Annotation of the Complex Terms in Multilingual Corpora

Authors

  • Ismaïl Biskri
  • Boubaker Hamrouni
  • Nicole Munyana

Abstract

For a long time, categorial grammars were regarded as "toy grammars". Indeed, despite a very solid theoretical foundation, categorial grammars remain rather marginal when it comes to designing concrete applications. However, this grammatical model has an unquestionable advantage over most other grammatical models: it is multilingual, and multilingualism has become, with the rise of the Web, one of the most significant constraints in the development of natural language processing tools. In this article we present a multilingual approach to the extraction of complex terms using a linguistic filter based on a categorial model.

Similar resources

Cross-lingual and Multilingual Speech Emotion Recognition on English and French

Research on multilingual speech emotion recognition faces the problem that most available speech corpora differ from each other in important ways, such as annotation methods or interaction scenarios. These inconsistencies complicate building a multilingual system. We present results for crosslingual and multilingual emotion recognition on English and French speech data with similar characterist...

Towards a new level of annotation detail of multilingual speech corpora

The aim of this paper is to highlight the actual need for corpora that have been annotated based on acoustic information. The acoustic information should be coded in features or properties and is needed to inform further processing systems, i.e. to present a basis for a speech recognition system using linguistic information. Feature annotation of existing corpora in combination with segmental a...

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

Multilingual Corpora Annotation for Processing Definite Descriptions

This paper presents a multilingual corpus study that aims to verify whether heuristics developed for coreference resolution in English texts are applicable to Portuguese and French.

Semantic Annotation for Interlingual Representation of Multilingual Texts

This paper describes the annotation process being used in a multi-site project to create six sizable bilingual parallel corpora annotated with a consistent interlingua representation. After presenting the background and objectives of the effort, we describe the multilingual corpora and the three stages of interlingual representation being developed. We then focus on the annotation process itsel...

Publication date: 2006